Prior-predictive value from fast growth simulations 
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Building on a variant of the Jarzynski equation we propose a new method to numerically determine 
the prior-predictive value in a Bayesian inference problem. The method generalizes thermodynamic 
integration and is not hampered by equilibration problems. We demonstrate its operation by apply- 
ing it to two simple examples and elucidate its performance. In the case of multi-modal posterior 
distributions the performance is superior to thermodynamic integration. 

PACS numbers: 02.50.-r, 02.50.Tt, 05.10.Ln 



INTRODUCTION 

Bayesian methods of inference [3, 0, 0] are playing an 
ever growing role in the statistical analysis of data in 
physics and other natural sciences 0, HI, Q • Among its 
particular virtues is the ability to perform model selec- 
tion, i.e. to quantitatively assess the appropriateness of a 
particular model irrespective of concrete parameter val- 
ues. This is accomplished by calculating what is called 
the evidence or the prior-predictive value. 

Building on rather general and essentially simple prin- 
ciples the efficiency of Bayesian methods in practical ap- 
plications depends crucially on the implemented numer- 
ical algorithms. A major difficulty common to Bayesian 
data analysis is the calculation of integrals in high- 
dimensional spaces which are dominated by contributions 
from small and labyrinthine regions. Similar problems 
are typical for the numerical determination of the par- 
tition function in statistical mechanics. It is therefore 
no surprise that some of the tools developed in statisti- 
cal mechanics have found their way into the arsenal of 
methods used in Bayesian inference. 

In the present paper we show that recent progress 
in the statistical mechanics of non-equilibrium processes 



ii 



12] entails new possibilities to estimate the 
prior-predictive value and to average with the posterior 
distribution of a Bayesian analysis. The new method 
relies on the Monte-Carlo (MC) simulation of a non- 
stationary Markov process and is intermediate between 
straight MC sampling and thermodynamic integration 
0- It is generally superior to the first and may also 
outperform the second. We first give a general theoret- 
ical discussion and then analyze numerically two simple 
examples. One of these was used already in [8( to scru- 
tinize the efficiency of thermodynamic integration. The 
second is a generalization thereof employing a bimodal 
likelihood. 



THEORY 

We consider a standard Bayesian inference problem in 
which parameters x of a model A4 are to be determined 



from data d. The prior information about x is coded in 
a prior distribution p p (x\A4), the likelihood of the data 
given certain values of x is denoted by pi{d\x, M). By 
Bayes' theorem the posterior distribution is given by 



p pos t(x\d, M) = 



Pi(d\x 1 M)p p {x\M) 
P{d\M) 



(1) 



Our central quantity of interest is the normalization of 
the posterior, the so called evidence or prior-predictive 
value, defined by 



P{d\M)= dxpi{d\x,M)p p (x\M) . 



(2) 



It quantifies the likeliness of the data for the particu- 
lar model M. under consideration and is therefore crucial 
for the comparison between different models. For the 
following manipulations the dependence oip v (x\M) and 
pi(d\x,A4) on x will be the important one, in order to 
lighten the notation we will therefore suppress the depen- 
dencies on d and M. in these quantities. Generically the 
integral in ([2]) is dominated by intricately shaped regions 
in a high-dimensional space and is therefore difficult to 
determine. 

A possible remedy for this problem is motivated by 
the method of thermodynamic integration used in statis- 
tical mechanics. To this end one introduces the auxiliary 
quantity 



Z((3) := I dx (pi(x)) P p p (x) 



(3) 



Clearly Z(Q) = 1 due to the normalization of the prior 
distribution and Z(\) = P(d\A4), the desired quantity. 
Moreover 

d If 

— hiZ(P) = -^-j J dx\npi(x) pf(x)p p (x) 
= : (\npi(x)) , 

where the /3-average in the last line is taken with the 
distribution 



Pp{x) :-- 



1 



pf(x)p p {x). 



Z(P) 



(4) 
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We therefore get 



In P(d\M) = [ dp^lnZ 
Jo op 



(/?)= / d0(ln Pl (x))p 
o 



(5) 

from which the name thermodynamic integration for the 
method becomes clear. 

Since averages are much more efficiently calculated 
from MC simulations than normalization factors ([5]) of- 
fers a convenient way to determine P{d\M) from equi- 
librium simulations for just a few values of between 
and 1. The method was suggested in Q, its advantages 
over a straight-forward MC estimation of P(d\A4) from 
(J2) was demonstrated for a simple test example in [§] . 

Let us consider the MC simulations necessary to im- 
plement ([5]) in somewhat more detail. We first discretize 
the /^-interval by introducing M values m , m = 1, M, 
with = 0i < 02 < — < 0m — 1. For each m we gener- 
ate a trajectory xt with a discrete time t = 1,2, ... mea- 
sured in MC steps. These trajectories are realizations of 
a Markov process with transition probability p(x, x'; m ) 
which leaves the distribution defined in ((4]) invariant, i.e. 
which satisfies 



Pp m {x)= / dx'p(x,x';0 m )Pp m (x') 



(6) 



A sufficiently long Markov chain is now generated for 
each m in order to get a reliable estimate for (lapi(x))/3 
to be used in ([5]). This equilibration of the system at 
each value of is the main bottleneck of the method. In 
realistic situations with a multi-modal or otherwise com- 
plicated structure of the likelihood it may become very 
slow and special care must be taken which elements of 
the trajectory x t to use for the estimation of the average 
(hipi(x)) p. As a rule specific quantities need to be iden- 
tified and monitored which indicate when the system has 
approximately equilibrated. 

These equilibration problems may be avoided by build- 
ing on recent progress in the statistical mechanics of 
non-equilibrium processes d, Qjl U, H, 13, [l4|. In the 
present context it gives rise to the following procedure to 
determine Z(l). We fix a time interval t = 1, ...,N and 
generate a set of trajectories Xt from a non- stationary 
Markov process such that for each trajectory changes 
from to 1. More precisely we fix M < N intermediate 
time points t m ,m — 1,...,M with 1 < t\ < t% < .... < 
t m < N at which (3 changes by A(3 m = m+ i — (3 m . We 
call the set {A0 m ,t m } of these time points and the cor- 
responding increments in (3 the protocol of the procedure. 

Consider a trajectory {xt} that starts in x\ drawn from 
the prior distribution p p and then evolves according to 
p(x,x';(3 m ) with (3 m fixed by the protocol {A/3 m ,t m }. 
The probability V{{xt}) of the whole trajectory is given 
by 



N-l 



P({ x t}) =Pp(xi) Y\_ P(xt+l,Xu/3r, 



(7) 



Consider now the trajectory dependent functional 

M-l 



(8) 



which is a random quantity due to its dependence on 
{xt}. We will show that its exponential average 

N 

(e R )= / Y[dxtV{{xt}) e R « x *» 
J t=i 

N N-l A/-1 

' JJ dx t p p (xi) JJ p(x t+ i, x t ; m ) JJ (pi(xt m )^j 



A/3„ 



m—1 



is equal to the desired quantity Z(l) = P(d\M). 

To this end we first note that the integrations over the 
first x t with 1 < t < t\ are easily performed since during 
these time steps = 0i = 0. Using ^ repeatedly for 
= we find 

_ti-i ti-i 

/ JJ dx t p p (xi) JJ p(x t+1 ,x t ;0) =p p (x tl ). (9) 
•* t=i t=i 

Together with the m = 1 term in ([5]) and using (0| as 
well as A0i — 2 we hence obtain 



„ ti—i i-i— i 

/ JJ^tPp(^i) Y[ P( x t+i,x t ;0)exp (A/3i lnpi(x tl )) 
J t=i t=i 

= P P (x tl ) pf 2 (x tl ) 
= Z(0 2 )P 02 (x tl ). 

The integrations over the Xt with t\ < t < t% can now be 
performed analogously. According to (|6|) we get at first 

-t 2 -i t 2 -i 

/ Yl dx t Pp 2 (xt 1 ) Y[ P{xt+l,Xt]02) = P(3 2 { x t 2 ) ■ 
J t=ti t=ti 

(10) 

since = 02 for the whole interval. Together with the 
second term of the sum in ([8]) we hence have 

Z(02) Pf3 2 (x t2 ) exp (A02\npi(x t2 )) 
= Pp {x t2 )p^{x t2 )pf^{x t2 ) 

= P P (xt 2 ) pf 3 (a;t 2 ) 
= Z(0 3 )P (33 (x t2 ). 

Iterating this procedure we finally arrive at 

(e*) = Z(0 M ) J dx N P 0M {x N ) = Z(l) = P(d\M) . 

(11) 

Generalizing this relation to continuous protocols 0{t) 
with < t < tf we obtain 

P(d\M) = (cxp(j\t\np l (x(t)) §£0®)} ■ (12) 



1=1 



3 



Eqs. (fTTj) and (fl"2"]) are our central result. The prior- 
predictive value can be determined from an exponential 
average of the quantity R({x t }) defined in © over an en- 
semble of MC trajectories x t generated with a transition 
probability p(x, x'\ /3(t)) that depends explicitly on time 
via the protocol (3(t). Note that this protocol must be 
the same for all trajectories {xt} that are used to deter- 
mine the average (e R ). It is nevertheless very remarkable 
that the results (fTTj) and (112|) respectively do not depend 
on the details of this protocol. 

As a small aside we note that in a Bayesian analysis 
one is usually more interested in averages with the pos- 
terior distribution ([1]) than in this distribution itself. By 
a slight generalization of the above proof one can show 
for the posterior average of some function f(x) 



dx N f(x N )Pf3 =1 (x N ) 
(13) 



(/} P oBt = / dxf(x)p post (x) 
(e R fM) 



where the averages in the last line are taken with V({x t }). 
Posterior averages may hence be determined from path 
averages starting from the prior distribution and incor- 



1 11 ] for an analogous 



porating the weight factor e R , cf. 
result in the statistical mechanics framework. 

Two limiting cases of Eqs. (fTTj) and (|12p are of in- 
terest. For 1 <^ M the system is manipulated in quasi- 
equilibrium and /3 and thus Pp(x) change very slowly. 
Accordingly the Markov chain will explore much of the 
state space for a given small /3-interval and we may there- 
fore replace \npi(x tm ) in © by (lnpi(x)}p m . As a conse- 
quence R no longer depends on the individual realizations 
of the trajectories {xt} and the average in (fl"2)) becomes 
superfluous. Therefore 

P(d\M)=exp( JJ dt(\n Pl (x)) p lfi(t)) 

= exp( / dp(ln Pl (x)) ) (14) 
Jo 

which brings us back to thermodynamic integration, cf. 

In the opposite limit /3 is changed in a single step from 
zero to one at some time t = t*, i.e. (3 — 6{t— t*) with the 
Heaviside ^-function being 1 for positive arguments and 
zero else. In this case the time integral in (fT2j) picks up 
contributions from t — only and the average over the 
trajectories {xt} reduces to the average over x* = x(t*) 
with the prior distribution p p . We then find from (fT2]) 



P(d\M) = J dx*expQnpi(x*))p p (x*), (15) 

which is identical with ([2]). This limit is equivalent to 
what is called thermodynamic perturbation in statistical 
mechanics (l5j . 



The proposed method hence interpolates between the 
two extreme variants ([2]) and ([5]) for the determination of 
the prior-predictive value P(d\M). It should be superior 
to the straight application of {2J since the average in (fl"2]) 
is already shortly after the start of the trajectories influ- 
enced by the likelihood pi(x) which is, as a rule, much 
sharper than the prior. It may outperform thermody- 
namic integration ([5]) since no time-consuming equilibra- 
tions are necessary. On the other hand, the exponential 
average in (fT2|) is known to be subtle. It is biased for 
finite sample sizes 3, lj| 2(| and dominated by rare re- 
alizations with very big values of R for processes too far 



from equilibrium 211 ] . Nevertheless, a strong argument 



in favour of the method is its applicability to systems 
that are hard to equilibrate and its great flexibility to 
optimization for which the whole protocol (3{t) is at our 
disposal. 

It is also worthwhile to mention that in many situations 
of interest the probability distribution of R is Gaussian 
with average (R) and variance a\ [l|| 17 1. In this case 



the average (e R ) can be calculated exactly with the result 



P{d\M) = cxp l(R) 



2 



(16) 



The determination of (R) and u\ from the MC simu- 
lations is, of course, much less demanding than the ex- 
traction of the complete distribution of R. For general 
distributions (|16p gives just the first two terms of the 
cumulant expansion. 



TWO SIMPLE EXAMPLES 

We now numerically investigate the performance of the 
method by applying it to two simple, exactly solvable 
examples. The first one employs a unimodal likelihood 
the second one uses a bimodal one. These model sys- 
tems are variants of those discussed in [§] and 22 1 re- 



spectively. The underlying inference problem is to esti- 
mate an n-dimensional vector x of parameters from an 
n-dimensional vector d of data with n — 128. 

For the unimodal case both prior p p {x) and likelihood 
pf(x) are taken to be Gaussians with zero mean and vari- 
ances <jp and of respectively. 

pf(x) = (2Wr" /2 cxp(-M_^) (17) 



2<7. 



p p (x) = (27r^)-«/2 exp(-^L) (18) 

We will use a p = 10 and o~i = 1 to model the typical 
situation in which the prior is substantially broader than 
the likelihood. Moreover wc choose a data vector d with 



i " 

<? = -£^ = 100 > 



(19) 



i=l 



4 



which on the one hand ensures that the data are far from 
the center of the prior p p (x), and on the other hand fixes 

the signal-to-noise ratio SNR= \J d 2 / erf to be 10 as in 

Q . The performance of the algorithms to be discussed 
below does not depend on the particular values of the 
di as long as d 2 = 100 is fulfilled. We therefore use the 
simple prescription di — 10 for all i = 1 . . . n. 

In the bimodal case only the likelihood differs which is 
given by 

1 90 

^) = -pjt(x) + -^(-x) (20) 

and hence consists of two Gaussians with the same vari- 
ance centered at x and —x respectively. It is important, 
however, that their relative weights are markedly differ- 
ent from each other. Our special choice of parameters 
implies that in equilibrium the region around —x should 
be sampled 20 times as often as the one around x. 

With the choices for prior and likelihood given above 
the prior-predictive value as defined in @ can be calcu- 
lated analytically. We get for both cases the result 

P(d\M) = (27r(a? + ^))-/ 2 cxp(- 2( Jj^| 2 ) . (21) 

With the parameter values chosen this yields 
\nP{d\M) = -476.358. 

In the following we will test thermodynamic integra- 
tion and our procedure against the exact result. In order 
to present a fair comparison between the performances of 
the numerical methods each simulation will comprise the 
same number (10 9 ) of MC steps. This number will, how- 
ever, be divided in different ways between the number M 
of intermediate (3- values, the number N of MC steps per 
/3-value for thermodynamic integration, (the number TV 
of MC steps per run for the exponential average) , and 
the number 7V C of runs to estimate the distribution of R, 
(to estimate the distribution of lnPtd). 

The thermodynamic integration scheme ((5|) requires 
estimates of (hipi(x)) p m at appropriately chosen values 
(3 m of (3. As a rule (\npi(x))p is a smooth function of (3 
and few such values will be sufficient. Trajectories {xt} 
are generated for each (3 m by the standard Metropolis 
algorithm. First a starting point x\ is chosen at random, 
e.g. from the prior distribution p p or taken directly from 
the endpoints at the previous f3 m . Subsequent moves for 
t = 2, ...,(N — 1) are obtained by generating a trial step 
x t — > x' from the distribution 

The step is accepted, x t+ \ = x', with probability 

Pacccpt = min[l, Pp m {x')/P Pm (x t )} ■ (23) 



and rejected, Xt+i — Xt, otherwise. The choice 
c s tep(/3) = 0.25(l/cTp + /3/of )~ x / 2 ensures a good accep- 
tance ratio for all (3 since it adapts to the width of Pp (x) 
as given in The first steps in the trajectories {x t } 
are not yet characteristic for the equilibrium distribution. 
We therefore discard the first 60% of steps for equilibra- 
tion. The rest is thinned out by discarding all but every 
tenth step to suppress correlations. From the remaining 
values the average (\npi(x))p is determined. We then 
calculate \nP(d\M) by integrating a cubic spline inter- 
polation of |5]). The procedure is repeated N c times to 
obtain an average of the log prior- predictive value, lnPtd, 
together with an error estimate. 

For the simulation of (Ti"2"| we have first to define the 
protocol (3(t) according to which the parameter (3 will be 
changed from to 1. We will use three different protocols 
with tf = 1. Introducing M equidistant time points 

Un = m ■ -^z , m = 1, ...,M 
these protocols are defined respectively by (cf. figfT]) 




0.2 0.4 0.6 0.8 1 

m/M 

FIG. 1: Plot of the three protocols used in the simulations. 
The dotted line shows the linear protocol (I24|l . the dashed- 
dotted one the exponential protocol (|26[) . and the full one the 
polynomial protocol defined by (|25[1 . 

The trajectories {xt} are generated by the Metropolis 
algorithm in a similar way as in the simulation of ther- 
modynamic integration, with, however, a few crucial dif- 
ferences. First, the number M of intermediate /3-values 
is much higher now. Second, (3 and therefore the ac- 
ceptance probability (|23p changes along the trajectory. 
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FIG. 2: Histogram of the logarithm of the prior-predictive 
value for the unimodal case as obtained from thermodynamic 
integration using M = 100 intermediate /3-values determined 
according to J25|, N = 10000 MC steps per f3 value and 
N c — 1000 runs. The dotted vertical line indicates the average 
In P td = -476.386 ± 0.05, the full one the exact result In P = 
-476.358. 

Third, the starting point x\ must now be sampled from 
the prior p p (x). Fourth, no equilibration is necessary and 
hence no points will be discarded. 

At each moment when (3 changes a new contribution 
is added to R according to ©. After N c trajectories 
have been simulated, a histogram of i?-values is gener- 
ated from which the average (R), the variance (Jr and 
the exponential average ln(e fl ) together with an error 
estimate are calculated. 



RESULTS 
Unimodal likelihood 

Representative results for the unimodal model from 
thermodynamic integration simulations are shown in 
figs E] and [3] as histograms for \nP(d\M.) together with 
their averages and the exact result (|2l"j) . The intermedi- 
ate values of (3 where chosen according to but this 
is not very crucial. As can be seen a very good estimate 
of the prior-predictive value may be obtained. 

From we infer that the intermediate distributions 
Pp(x) are all Gaussians in this case and equilibration 
is hence no problem. This is also corroborated by the 
comparison between figs H] and [3] which show that a few 
very long trajectories do not yield substantially better 
results than many long trajectories. Accordingly ther- 
modynamic integration works well. 

Results from simulations of (fl"!?|) for the unimodal case 
are shown in figs[4][7] 

FigslUGIl and [5] highlight the influence of the protocol 
(3(t), all other parameters are the same. As can be seen 



FIG. 3: Same as fig[2]with M = 20, N = 500000, and N c = 
100 resulting in lnP td = -476.349 ± 0.04. 
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FIG. 4: Histogram of 7?-values for the unimodal case as ob- 
tained by simulating (|12[) for the polynomial protocol (|25[) 
using N c — 1000 trajectories with N = M — 10 6 steps each. 
The vertical lines show the mean (R) = —476.876 (dotted), 
the estimate ln(e H ) = -476.362 ± 0.05 (dashed-dotted), and 
the exact result InP = —476.358 (full). The variance of the 
histogram is given by or — 1.0 . 
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FIG. 5: Same as fig(4]for the exponential protocol (|26l) . In 
this case (R) = -478.226, a R = 1.9, and ln(e fl ) = -476.574± 
0.09. 
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FIG. 6: Same as fig|4] for the linear protocol (I24|l yielding 
(R) = -479.308, a R = 2.4, and ln(e fl ) = -476.384 ±0.3. 



the i?-distributions produced are characterized by differ- 
ent mean values (R) and different widths ur- For /3 poly (t) 
we get the narrowest and for P lln (t) the widest distribu- 
tion. Nevertheless the estimate \n(e R ) barely changes 
and is for all three cases rather near to the exact value. 
Lower values of (R) and larger ones for <jr compensate 
each other (cf. (|16[) ) leaving the estimate for the prior- 
predictive value almost the same. Still, as far as the error 
in the estimate is concerned, a narrow distribution of R 
is advantageous and correspondingly /3 poly (i) performs 
best. 

Making the trajectories longer, decreases the variance 
in R further as can be seen in fig|7]but leaves less realiza- 
tions N c for the exponential average (e R ). For the present 
case, however, the estimate for the prior-predictive value 
remains reliable. 



20 



15 



10 



n .nfl 



Dn 



_o 



-477.5 



-477 



-476.5 
R 



-476 



-475.5 



FIG. 7: Same as fig|4] using again the polynomial protocol 
([25]) but now with N c = 100 runs consisting of M = N = 10 7 
steps each. The results are now (R) = —476.41, or = 0.3, 
and lr^e*) = -476.370 ± 0.03. 



In the simulations we have observed that our method 
gives best results for many intermediate values of (3. 



Therefore we have chosen for M the maximal possible 
number, M = N, implying that (3 is changed after each 
MC-step. From the discussion around eq. ([T3J) this means 
that we are using our method in a regime where it is very 
similar to thermodynamic integration. This makes sense: 
for simple situations without equilibration problems ther- 
modynamics integration works fine and our more general 
method yields comparable results for protocols which are 
near to a quasi-static process. 



Bimodal likelihood 

The results obtained from thermodynamic integration 
for the bimodal case are displayed in figs. [5] and O It is 
clearly seen that the good performance of the unimodal 
case is not reached. The estimate for the prior-predictive 
value differs substantially from the exact value. This fail- 
ure may be traced back to the incomplete equilibration 
between the two maxima of the likelihood. In the be- 
ginning of the simulation starting with the prior which 
is symmetric around x = the regions around x = — 1 
and x = 1 are populated by the MC trajectories with 
roughly the same density. Later in the simulation transi- 
tions between the regions are extremely rare and conse- 
quently the different prefactors in (|20|) are not properly 
reproduced. It is this incomplete equilibration which is 
typical for multimodal distributions that impede a sat- 
isfactory performance of thermodynamic integration. As 
shown in fig[9] this failure cannot be mitigated by using 
longer trajectories since the equilibration over the barrier 
at x — is simply too slow. 
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FIG. 8: Histogram of the logarithm of the prior-predictive 
value for the bimodal case as obtained from thermodynamic 
integration. Parameters and meaning of the lines are the same 
as in fig.©. The result is now lnP td = -477.34 ± 0.07 the 
exact value is still lnP = —476.358 



Results for the simulation of (|12|l for the bimodal case 
are displayed in figsfTU] and [TT] As can be seen the ac- 
curacy of the estimates for the prior-predictive value are 
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FIG. 9: Same as fig(8]with parameters as in fig(3] The result FIG. 11: Same as fig llOl with N c — 1000 runs consisting of 
is InPtd = -477.338 ± 0.16. M = 10 5 /3-steps and N = 10 6 steps all together resulting in 

(7?) = -477.74, a R = 0.06, and ln(e fl ) = -476.371 ± 0.06. 



much better than those from thermodynamic integration. 
In fact the quality of the results is comparable to those 
for the unimodal case. This may seem surprising since 
the bimodal structure of the histogram of R clearly in- 
dicates that the realizations from the MC simulation are 
again captured by one of the two maxima of the likeli- 
hood. However, the weight factor e R differs for the two 
subsets of trajectories in exactly such a way as to pro- 
duce the correct prefactors in front of the two parts of 
the posterior distribution, cf. (fl~3|) . As a consequence a 
precise estimate of the prior-predictive value can be ob- 
tained although no final equilibration was reached. In 
this situation our method is hence superior to thermody- 
namic integration. 
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FIG. 10: Histogram of i?-values for the bimodal case as ob- 
tained by simulating (|12|l for the polynomial protocol (|25|l us- 
ing N c = 100 runs consisting of M = 10 6 /3-steps and N = 10 7 
steps altogether. The results are {R) = —477.26, <t_r = 0.15, 
and \n{e R ) = -476.38 ± 0.10. 



DISCUSSION 

In the present note we have introduced a new method 
to numerically determine the prior-predictive value in a 
Bayesian inference problem from MC simulations. Our 
method derives from a variant of the Jarzynski equation 

i 

(e-P w ) = e -? AF (27) 

that allows to determine the free energy difference AF 
between two equilibrium states of a system at inverse 
temperature (5 from an exponential average of the work 
W done in a non- equilibrium transition between the two 
states. In statistical mechanics this equation has been 
used already to find differences in free energy from fast- 
growth simulations [ijj ]. 

The method proposed in the present paper incorpo- 
rates two approaches as limiting cases that are used al- 
ready to determine the prior-predictive value, namely 
straight MC estimation and thermodynamic integration. 
The method was shown to work well in a simple unimodal 
example in which its efficiency was comparable with ther- 
modynamic integration. It proved to be superior in the 
bimodal example. Our numerical implementation of both 
algorithms is not optimal. The amount of samples dis- 
carded in thermodynamic integration certainly can be 
reduced and the protocol /3 poly for our procedure is not 
very sophisticated. 

However, the example chosen with Gaussians for both 
prior and likelihood is rather remote from real applica- 
tions so that a fine-tuning of the procedures for this spe- 
cial case seems to be somewhat ill-advised. A compari- 
son of the methods when applied to a more realistic setup 
with the complications alluded to in the introduction and 
when implemented in a more optimal way is left for fu- 
ture work. 
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The main advantage of the new method presumably 
lies in its applicability to multimodal systems that resist 
naive equilibration approaches, and its great flexibility 
parametrized by a protocol function (3(t) which may be 
adapted to the particular problem under consideration. 
We therefore hope that the method will provide a useful 
extension of the box of tools available to perform model 
selection in the framework of Bayesian data analysis. 

Acknowledgment: We are indebted to 

Prof. Dr. Volker Dose for many stimulating discus- 
sions. 



* Electronic address: ahlers@theorie.physik.uni-oldenburg.de 
' Electronic address: engel@theorie.physik.uni-oldenburg.de 
[1] Jaynes E. T., Probability Theory: The Logic of Science 

(Cambridge University Press, Cambridge, 2003) 
[2] Gelman A., Carlin J. B., Stern H. S., Rubin D. B., 
Bayesian Data Analysis (Chapman and Hall, London, 
1995) 

[3] Leonhard T. and Hsu J. S. J., Bayesian Methods: 
An Analysis for Statisticians and Interdisciplinary Re- 
searchers (Cambridge University Press, Cambridge, 
1999) 

[4] Bernardo J. M. et al. (eds.) Bayesian Statistics 7 (Oxford 

University Press, Oxford, 2003) 
[5] D'Agostini C, Rep. Prog. Phys. 66, 1383 (2003) 



[6] Dose V., Rep. Prog. Phys. 66, 1421 (2003) 
[7] R. Neal, Probabalistic Inference using Markov Chain 
Monte-Carlo Methods, Dept. of Computer Science, Uni- 
versity of Toronto, 1993 
[8] von der Linden W., Preuss R., and Dose V., The prior- 
predictive value: A paradigm of nasty multi- dimensional 
integrals in Maximum Entropy and Bayesian Methods, 
von der Linden W. et al. (eds.) (Kluwer, Dordrecht, 1999) 
[9] Jarzynski O, Phys. Rev. Lett. 78, 2690 (1997) 
[10] Jarzynski O, J. Stat. Phys. 98, 77 (2000) 
[11] Crooks C, Phys. Rev. E61, 2361 (2000) 
[12] Seifert U., Phys. Rev. Lett. 95, 040602, (2005) 
[13] J. L. Lebowitz and H. Spohn, J. Stat. Phys. 95, 333 
(1998) 

[14] Chatelain O, Temperature extended Jarzynski relation: 
Application to the numerical calculation of the surface 
tension", |cond-mat /0702044 
[15] Zwanzig R., J. Chem. Phys. 22, 1420 (1954) 
[16] Park S. and Schulten K., J. Chem. Phys. 120, 5946 
(2004) 

[17] T. Speck and U. Seifert, Phys. Rev. E70, 066112 (2004) 
[18] Fox R. F., Proc. Natl. Acad. Sci. USA 100, 12537 (2003) 
[19] Gore J., Ritort F., and Bustamante O, Proc. Natl. Acad. 

Sci. USA 100, 12564 (2003) 
[20] Hummer C, J. Chem. Phys. 114, 7330 (2001) 
[21] Jarzynski, O, Phys. Rev. E73, 046105 (2006) 
[22] M. Daghofer, M. Konegger, H . G. Evertz, and W. von 

der Linden, Perfect Tempering arXiv:physics/0512167 
[23] D. A. Hendrix and C. Jarzynski, J. Chem. Phys. 114, 

5974 (2001) 



